
# Cross-modal understanding

Qwen2.5 Omni 7B GGUF
Other
Qwen2.5-Omni-7B-GGUF is the GGUF format version of the Qwen2.5-Omni-7B model, supporting multimodal inputs including text, audio, and images.
Large Language Model, English
ggml-org
319
3
VITA 1.5
VITA-1.5 is a multimodal interaction model designed to achieve GPT-4o level real-time vision and voice interaction capabilities.
Safetensors
VITA-MLLM
345
40
CSUMLM
Apache-2.0
CSUMLM is an AI system that combines a multimodal AI engine with a large language model. It offers multimodal processing, complex language understanding, and real-time learning capabilities.
Multimodal Fusion, Transformers, Supports Multiple Languages
Or4cl3-1
35
1
Veld Base
Apache-2.0
Pre-trained visual encoder-text decoder model supporting Korean and English
Image-to-Text, Transformers, Supports Multiple Languages
KETI-AIR
40
0
© 2025 AIbase